Moderate diversity for better cluster ensembles

Authors

  • Stefan Todorov Hadjitodorov
  • Ludmila I. Kuncheva
  • Ludmila P. Todorova
Abstract

The adjusted Rand index is used to measure diversity in cluster ensembles, and a diversity measure is subsequently proposed. Although the measure was found to be related to the quality of the ensemble, this relationship appeared to be non-monotonic. In some cases, ensembles that exhibited a moderate level of diversity gave a more accurate clustering. Based on this, a procedure for building a cluster ensemble of a chosen type is proposed (assuming that an ensemble relies on one or more random parameters): generate a small random population of cluster ensembles, calculate the diversity of each ensemble, and select the ensemble corresponding to the median diversity. We demonstrate the advantages of both our measure and procedure on 5 data sets and carry out statistical comparisons involving two diversity measures for cluster ensembles from the recent literature. An experiment with 9 data sets was also carried out to examine how the diversity-based selection procedure fares on ensembles of various sizes. For these experiments, classification accuracy was used as the performance criterion. The results suggest that selection by median diversity is no worse, and in some cases better, than building and holding on to one ensemble. © 2005 Elsevier B.V. All rights reserved.
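The selection procedure described in the abstract (build a small random population of ensembles, score each by diversity, keep the one at the median) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact diversity formula, so `ensemble_diversity` here assumes the mean of 1 − ARI over all pairs of member partitions, and `make_toy_ensemble` is a hypothetical generator that stands in for a real ensemble method with a random parameter.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label lists of equal length (>= 2)."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def ensemble_diversity(ensemble):
    """ASSUMPTION: diversity = mean of (1 - ARI) over all pairs of members."""
    pairs = list(combinations(ensemble, 2))
    return sum(1.0 - adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)


def make_toy_ensemble(rng, base, n_members=5, n_clusters=3):
    """Hypothetical generator: each member is a noisy copy of `base`."""
    noise = rng.uniform(0.05, 0.5)  # the ensemble's random parameter
    return [
        [rng.randrange(n_clusters) if rng.random() < noise else label
         for label in base]
        for _ in range(n_members)
    ]


def select_median_diversity(population):
    """Return the ensemble whose diversity is the (lower) median."""
    ranked = sorted(population, key=ensemble_diversity)
    return ranked[(len(ranked) - 1) // 2]


rng = random.Random(0)
base = [i % 3 for i in range(30)]  # toy reference partition, 3 clusters
population = [make_toy_ensemble(rng, base) for _ in range(7)]
chosen = select_median_diversity(population)
print(round(ensemble_diversity(chosen), 3))
```

With an odd population size the median is unambiguous; for an even size this sketch takes the lower of the two middle ensembles, which is a design choice made here and not something the abstract specifies.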

Similar resources

Selecting Diversifying Heuristics for Cluster Ensembles

Cluster ensembles are deemed to be better than single clustering algorithms for discovering complex or noisy structures in data. Various heuristics for constructing such ensembles have been examined in the literature, e.g., random feature selection, weak clusterers, random projections, etc. Typically, one heuristic is picked at a time to construct the ensemble. To increase diversity of the ense...

Diversifying Heuristics for Cluster Ensembles

Cluster ensembles are deemed to be better than single clustering algorithms for discovering complex or noisy structures in data. We consider different heuristics to introduce diversity in cluster ensembles and study their individual and combined effect on the ensemble accuracy. Our experiments with three artificial and three real data sets, and 12 ensemble types, showed that the most successful...

Cluster Ensemble Selection

This paper studies the ensemble selection problem for unsupervised learning. Given a large library of different clustering solutions, our goal is to select a subset of solutions to form a smaller but better performing cluster ensemble than using all available solutions. We design our ensemble selection methods based on quality and diversity, the two factors that have been shown to influence clu...

Genetic diversity assessment in physic nut (Jatropha curcas L.)

Mahalanobis’ D-square (D²) statistic was applied to assess diversity in the 9 genotypes collected from the semi-arid region of India (7 normal toxic genotypes from Gujarat and Rajasthan, and two non-toxic genotypes from CSMCRI’s plantation in Orissa). These genotypes were grouped into five clusters. Clusters I and III had two genotypes each, cluster II had three genotypes, and clusters IV and V contributed as solitary ge...

Diversity-Based Weighting Schemes for Clustering Ensembles

Clustering ensembles has been recently recognized as an emerging approach to provide more robust solutions to the data clustering problem. Current methods of clustering ensembles typically fall into instance-based, cluster-based, or hybrid approaches; however, most of such methods fail in discriminating among the various clusterings that participate to the ensemble. In this paper, we address th...

Journal:
  • Information Fusion

Volume 7

Publication date: 2006